1 Introduction



Reconstructing the Tree of Life, i.e. the phylogenetic tree displaying all living
species on earth, is one of the main challenges of biological sciences to date.
The genetic sequence data on some clusters of species are already available
in databases like GenBank or SwissProt, and there are algorithms available
to reconstruct the tree of each cluster. In many studies, data from different
loci are combined by building a tree from each locus and combining these trees
into a so-called 'supertree'. In this setting, it is common that the supertree
contains all taxa, whereas the input trees for the individual loci often do not
contain all taxa under consideration. The input trees may even be incompatible
with one another, which makes it impossible to find a perfect supertree, i.e. a
supertree displaying all the input trees. But even in the case of compatibility,
it is not always clear which supertree is best, as there may be more than one.



In a recent study, Steel and Sanderson (2010) mathematically characterized
so-called phylogenetically decisive sets of taxon sets. These sets consist
of input taxon sets which have the property that all possible compatible input
trees chosen for the input sets lead to a unique supertree. This is an interesting
setting, as it shows that the decision which taxa to sample for each
input tree may already ensure that the supertree of all input trees is unique.
In this context, a set of taxon sets which is phylogenetically decisive can also
be referred to as a set of perfect taxon samples. However, the complexity of
deciding whether or not a given set of taxon sets is phylogenetically decisive
remained unknown.

Problem 1 (Phylogenetic decisiveness decision problem) Given a taxon set
$X$ with $|X| = n$ and a set $S = \{Y_1, \ldots, Y_k\}$ of subsets of $X$, what is the
complexity of determining whether $S$ is phylogenetically decisive or not?

Problem 1 was listed on the 2012 'Penny Ante' prize list of the Annual
New Zealand Phylogenetics Meeting (Steel (2011)), and it is the aim of this
paper to present a solution to the problem.

In their study, Steel and Sanderson showed that a set of taxon sets is
phylogenetically decisive if and only if it satisfies the so-called four-way partition
property. However, this characterization unfortunately does not provide an
answer to Problem 1, as checking the four-way partition property is not a priori
efficient. In our paper, we extend the work of Steel and Sanderson. First,
while the work of Steel and Sanderson focuses on the unrooted tree case, i.e.
the case where all input trees and the supertree are unrooted phylogenetic
trees, we additionally consider the rooted case. Moreover, we show that in
both cases, for a given set of taxon sets it can be decided in polynomial time
whether this set is phylogenetically decisive, as long as the number of taxon sets
under consideration is polynomial in the number of taxa. We present algorithms
for both cases, which make explicit use of some characterizing properties we
derive in this work, as well as some worst-case runtime bounds for these
algorithms.

More importantly, our algorithm for the unrooted tree case is useful for more
than checking whether or not a set of taxon sets is phylogenetically decisive,
which would imply that all possible quartets of four taxa get uniquely
resolved when the input trees are combined into a supertree. This would be the
perfect case, which may not occur very often in practice. Often, however,
it is not necessary to have a unique supertree; it suffices to find out the
relationship (and thus the unique subtree common to all supertrees, if it exists)
of a particular quartet which is not already resolved by any of the input trees.
In terms of our algorithm, this means that only this particular quartet
needs to be examined, i.e. one iteration is needed. This makes
our approach useful for phylogeneticists: even if the set of taxon sets they
are investigating is not phylogenetically decisive, it may still carry some
useful new information which can be determined this way.



2 Notation and Model Assumptions 

We first introduce some notations and definitions required for presenting our 
results. We start with the standard notion of phylogenetic trees both for the 
rooted and the unrooted case. 

Definition 1 A rooted phylogenetic X-tree T is a connected, acyclic graph with 
one node of degree 2, namely the so-called root, all other internal nodes of 
degree 3 and |X| = n nodes of degree 1, namely the so-called leaves or taxa. 
Whenever there is no ambiguity, we refer to a rooted phylogenetic X-tree as 
rooted tree for short. 

Definition 2 An unrooted phylogenetic X-tree T is a connected, acyclic graph
where all internal nodes are of degree 3 and where there are |X| = n nodes of
degree 1, namely the so-called leaves or taxa. Whenever there is no ambiguity,
we refer to an unrooted phylogenetic X-tree as unrooted tree for short.

We are now in a position to define phylogenetic decisiveness. 

Definition 3 A collection $S = \{Y_1, \ldots, Y_k\}$ of subsets of $X$ (i.e. $Y_i \subseteq X$ for all
$i = 1, \ldots, k$) is said to be phylogenetically decisive for rooted or unrooted trees,
respectively, if for every rooted or unrooted binary phylogenetic X-tree $T$, the
collection $\{T|_{Y_i} : Y_i \in S\}$ defines $T$ (i.e. $T$ is the only tree that displays these
trees).

In order to work with phylogenetic decisiveness, we require some concepts
which are similarly used in the context of phylogenetic groves, e.g. by
Ané et al. (2009) and Fischer (2011).



Definition 4 Let $\pi = S_1|S_2|\ldots|S_m$ be a partition of $S = \{Y_1, \ldots, Y_k\}$ (where
$Y_i \subseteq X$ for all $i = 1, \ldots, k$), for some $m \le k$. A set $\{x,y,z\}$ such that $x,y,z \in X$
and $\{x,y,z\} \not\subseteq \mathcal{L}(S_i)$ for all $i = 1, \ldots, m$, where $\mathcal{L}(S_i) = \bigcup_{j : Y_j \in S_i} Y_j$, is called
a cross triple of $S$ with respect to $\pi$, or CT wrt $\pi$ for short.



Definition 5 Let $\pi = S_1|S_2|\ldots|S_m$ be a partition of $S = \{Y_1, \ldots, Y_k\}$ (where
$Y_i \subseteq X$ for all $i = 1, \ldots, k$), for some $m \le k$. A set $\{a,b,c,d\}$ such that $a,b,c,d \in X$
and $\{a,b,c,d\} \not\subseteq \mathcal{L}(S_i)$ for all $i = 1, \ldots, m$, where $\mathcal{L}(S_i) = \bigcup_{j : Y_j \in S_i} Y_j$, is called
a cross quadruple of $S$ with respect to $\pi$, or CQ wrt $\pi$ for short.
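
Definitions 4 and 5 can be made concrete with a small sketch. The Python code below enumerates the cross triples or cross quadruples of a collection with respect to a given partition. The function names `leaf_union` and `cross_tuples`, and the representation of a partition as a list of blocks (each block a list of Python sets), are assumptions made purely for illustration.

```python
from itertools import combinations

def leaf_union(block):
    """L(S_i): the union of all taxon sets Y_j in one block S_i of the partition."""
    return frozenset().union(*block)

def cross_tuples(X, partition, size):
    """All `size`-element subsets of X that are contained in no L(S_i).

    size=3 yields the cross triples (Definition 4),
    size=4 yields the cross quadruples (Definition 5).
    """
    unions = [leaf_union(block) for block in partition]
    return [set(c) for c in combinations(sorted(X), size)
            if not any(set(c) <= u for u in unions)]

# Hypothetical toy input: X = {1,2,3,4}, Y_1 = {1,2,3}, Y_2 = {3,4}.
X = {1, 2, 3, 4}
all_split = [[{1, 2, 3}], [{3, 4}]]   # partition Y_1 | Y_2
one_block = [[{1, 2, 3}, {3, 4}]]     # partition with a single block
print(cross_tuples(X, all_split, 3))
print(cross_tuples(X, one_block, 3))
```

In this toy input, the partition that splits all elements apart has three cross triples, whereas the single-block partition has none, because its only block covers all of $X$.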



Definition 6 Let $S = \{Y_1, \ldots, Y_k\}$ be a set of taxon sets and let $\pi = S_1|\ldots|S_m$
be a partition of $S$. Let $\{x,y,z\}$ be a cross triple of $S$ with respect to $\pi$ (or
$\{a,b,c,d\}$ a cross quadruple of $S$ with respect to $\pi$). Then, $\{x,y,z\}$ (or $\{a,b,c,d\}$,
respectively) is called resolved if there is a choice of rooted (or unrooted,
respectively) trees on $Y_i$, $i = 1, \ldots, k$, such that all possible supertrees of these
trees display the same one of the three possible trees on $\{x,y,z\}$ (or $\{a,b,c,d\}$,
respectively).



These few notions suffice to derive the desired results. 



3 Results 



3.1 Rooted tree case 



First, we present the results for the case where all trees are thought of as being
rooted. It turns out that this case is much simpler than the unrooted case,
which is analyzed later on. Moreover, the rooted case can be characterized in
such a simple way that the decision whether a particular set of taxon sets is
phylogenetically decisive can be reduced to a search for cross triples. This
also leads to a new characterization of phylogenetic decisiveness.

We first state the main result, which is proven subsequently.

Theorem 1 (Main theorem 1) Given a set $S = \{Y_1, \ldots, Y_k\}$ of subsets of $X$,
where $|X| = n$, the question whether $S$ is phylogenetically decisive can be answered
in at most $O(k \cdot n^3)$ steps. In particular, as long as $k$ is polynomial in $n$, the question
can be answered in polynomial time.

In order to prove the above theorem, we first need to prove some useful
properties concerning cross triples.

Proposition 1 Let $S = \{Y_1, \ldots, Y_k\}$ be a phylogenetically decisive set of subsets of
$X$. Let $\pi$ be a partition of $S$. Then, there are no cross triples of $S$ with respect to $\pi$.

Proof Assume there is a CT $\{x,y,z\}$ of $S$ wrt $\pi$. We now construct $k$ trees
$T_1, \ldots, T_k$ as follows: If $Y_i$ contains only one of the elements $x$, $y$ or $z$, we
choose $T_i$ such that this element is directly connected to the root, say on the
right-hand side of $T_i$, and all other taxa of $Y_i$ are contained in the left-hand
side of $T_i$. If $Y_i$ contains two of the elements $x$, $y$ or $z$, we choose $T_i$ such that
these two elements form a cherry which is directly connected to the root, say
on the right-hand side of $T_i$, and all other taxa of $Y_i$ are contained in the
left-hand side of $T_i$. Note that no $Y_i$ can contain all three elements $x$, $y$ and $z$, as
$\{x,y,z\}$ is a CT. The construction of $T_i$ is depicted in Figures 1 and 2.



Fig. 1: Construction of $T_i$ for the case where $x \in Y_i$ and $y, z \notin Y_i$.

Note that we do not specify the rest of the left-hand side subtrees of $T_i$ or
the trees $T_i$ for those subsets $Y_i$ which do not contain $x$, $y$ or $z$. They can all be
chosen arbitrarily as long as one makes sure that all $T_i$ are compatible.

Now since the $T_i$ are all compatible, we can combine them into a supertree
$\mathcal{T}$. However, by construction, $\mathcal{T}$ cannot be unique, as the relationship of $x$,
$y$ and $z$ cannot be resolved: no $T_i$ contains all of them, and those $T_i$ which
contain up to two of them just carry the information that they are equally
unrelated to the other taxa. Thus, $\{x,y,z\}$ is not resolved and therefore there is no
unique supertree of the $T_i$. Therefore, $\mathcal{T}$ is not the only tree that displays the
$T_i$. An example of two trees $\mathcal{T}$ and $\mathcal{T}'$ displaying the $T_i$ as constructed above
is depicted in Figure 3.

Fig. 2: Construction of $T_i$ for the case where $x, y \in Y_i$ and $z \notin Y_i$.



Fig. 3: Two possible supertrees $\mathcal{T}$ and $\mathcal{T}'$ for the $T_i$, showing two different resolutions of the cross triple $\{x,y,z\}$.

The fact that $\mathcal{T}$ is not the only tree displaying the $T_i$ contradicts the fact
that $S$ is phylogenetically decisive. This completes the proof.



Next, we state a sufficient criterion for phylogenetic decisiveness. 

Proposition 2 Let $S = \{Y_1, \ldots, Y_k\}$ be a set of subsets of $X$ such that there is no
cross triple with respect to any partition of $S$. Then, $S$ is phylogenetically decisive.

Proof Let $S = \{Y_1, \ldots, Y_k\}$ be such that there is no cross triple with respect
to any partition of $S$. Assume $S$ is not phylogenetically decisive. Then there
exists a choice of trees $T_1, \ldots, T_k$ on $Y_1, \ldots, Y_k$ such that these trees are
represented by at least two different supertrees $\mathcal{T}_1$ and $\mathcal{T}_2$. Since these two trees
are different, they resolve at least one triple $\{x,y,z\}$ of taxa in $X$ differently,
and thus in particular this triple $\{x,y,z\}$ is not resolved by any of the input
trees $T_1, \ldots, T_k$. But as there is no CT wrt $\pi = Y_1|Y_2|\ldots|Y_k$, i.e. the partition
which splits all elements of $S$ apart, by Definition 4 all triples are subsets of
at least one $Y_i$. So $\{x,y,z\} \subseteq Y_i$ for some $i \in \{1, \ldots, k\}$, and therefore $\{x,y,z\}$
is resolved by $T_i$. This is a contradiction and thus completes the proof.

□ 

Now we show a useful property of cross triples, which is needed in the 
following. 

Proposition 3 Let $S = \{Y_1, \ldots, Y_k\}$ be a set of taxon sets with $Y_i \subseteq X$ for all
$i = 1, \ldots, k$. Let $\pi = Y_1|Y_2|\ldots|Y_k$ be the partition of $S$ which splits all elements
of $S$ apart. Then, if there is no cross triple of $S$ with respect to $\pi$, there are no cross
triples with respect to any other partition of $S$, either.

Proof Assume there is no cross triple of $S$ with respect to $\pi = Y_1|Y_2|\ldots|Y_k$.
Let $\varphi = S_1|S_2|\ldots|S_m$ be a partition of $S$ with $m < k$, i.e. each $Y_j$ belongs to
exactly one $S_i$, but each $S_i$ may contain more than one $Y_j$. Now suppose there
is a CT $\{x,y,z\}$ of $S$ wrt $\varphi$. By Definition 4, this implies that no $\mathcal{L}(S_i)$ contains $x$, $y$
and $z$ together. Thus, in particular no $Y_j$ can contain $x$, $y$ and $z$ together. Then,
again by Definition 4, if there is no $Y_j$ which contains $x$, $y$ and $z$ together, this
implies that $\{x,y,z\}$ is a CT with respect to $\pi = Y_1|Y_2|\ldots|Y_k$. This contradicts
the assumption and thus completes the proof.

□ 

We are now in a position to state a novel characterization of phylogenetic 
decisiveness in terms of the following corollary. 

Corollary 1 A set $S = \{Y_1, \ldots, Y_k\}$ of taxon sets with $Y_i \subseteq X$ for all $i = 1, \ldots, k$
is phylogenetically decisive if and only if there is no cross triple of $S$ with respect to
$\pi = Y_1|Y_2|\ldots|Y_k$.

Proof 

1. Let $S = \{Y_1, \ldots, Y_k\}$ be a set of taxon sets with $Y_i \subseteq X$ for all
$i = 1, \ldots, k$ which is phylogenetically decisive. Then, by Proposition 1, there
are no cross triples of $S$ with respect to any partition of $S$. This implies
in particular that there is no cross triple with respect to the partition
$\pi = Y_1|Y_2|\ldots|Y_k$.

2. Let $S = \{Y_1, \ldots, Y_k\}$ be a set of taxon sets with $Y_i \subseteq X$ for all $i = 1, \ldots, k$
such that there is no cross triple of $S$ with respect to the partition
$\pi = Y_1|Y_2|\ldots|Y_k$ which splits all elements of $S$ apart. By Proposition 3, this
implies that there is no cross triple with respect to any partition of $S$.
By Proposition 2, it follows that $S$ is phylogenetically decisive. This
completes the proof.

□

Now we can prove the main theorem of this section. In the proof, we also
provide an algorithm for solving Problem 1.

Proof (Main Theorem 1) Let $S = \{Y_1, \ldots, Y_k\}$ be a set of subsets of $X$, where
$|X| = n$. We want to decide if $S$ is phylogenetically decisive. By Corollary 1,
this means we have to check if there is a cross triple of $S$ with respect to
$\pi = Y_1|Y_2|\ldots|Y_k$. We now provide an $O(k \cdot n^3)$ algorithm.

We assume an arbitrary ordering of $\binom{X}{3} = \{Y : Y \subseteq X \text{ and } |Y| = 3\}$ and
we continue as follows:

BEGIN

Initialization: Marker(i) := 0 for all $i = 1, \ldots, |\binom{X}{3}|$.

For $i = 1, \ldots, |\binom{X}{3}| = \frac{1}{6}(n^3 - 3n^2 + 2n)$

    For $j = 1, \ldots, k$

        Check if element $i$ of $\binom{X}{3}$ is a subset of $Y_j$. If yes: Marker(i) := 1.
    If Marker(i) == 0: STOP. S is not phylogenetically decisive.
If Marker(i) == 1 for all $i = 1, \ldots, |\binom{X}{3}|$: S is phylogenetically decisive.
END

So the algorithm checks for every triple of taxa of $X$ whether it is a cross triple (in this
case, its marker remains 0) or not (in this case, the marker is changed to 1).
If a cross triple is found, the algorithm stops immediately, as then $S$ is not
phylogenetically decisive. If, on the other hand, after checking all triples no
cross triple is found, $S$ is phylogenetically decisive. Note that the checking
step can be done with optimal hash tables in $O(1)$, i.e. constant time (where
the hash table build-up has complexity $O(|Y_j|)$). Now as this is repeated
$k \cdot \frac{1}{6}(n^3 - 3n^2 + 2n)$ times, the overall time needed is bounded by $O(k \cdot n^3)$. This
completes the proof.

□ 
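
The marker algorithm from the proof might be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names `find_cross_triple` and `is_decisive_rooted` are hypothetical, Python's built-in hash sets stand in for the hash tables mentioned in the proof, and the explicit marker array is replaced by an early return as soon as a cross triple is found.

```python
from itertools import combinations

def find_cross_triple(X, S):
    """Return a triple of X contained in no Y_j, i.e. a cross triple with
    respect to the partition Y_1|...|Y_k, or None if there is none."""
    taxon_sets = [frozenset(Y) for Y in S]  # hash sets: O(1) expected lookups
    for triple in combinations(sorted(X), 3):
        # The marker of this triple stays 0 if no Y_j contains it.
        if not any(all(t in Y for t in triple) for Y in taxon_sets):
            return triple
    return None

def is_decisive_rooted(X, S):
    """Rooted decisiveness via Corollary 1: no cross triple w.r.t. Y_1|...|Y_k."""
    return find_cross_triple(X, S) is None

# Hypothetical toy inputs on X = {1,2,3,4}.
X = {1, 2, 3, 4}
print(is_decisive_rooted(X, [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}]))
print(find_cross_triple(X, [{1, 2, 3}, {1, 2, 4}]))
```

The outer loop visits at most $\binom{n}{3}$ triples and the inner check visits at most $k$ taxon sets, which matches the $O(k \cdot n^3)$ bound of Theorem 1.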

Note that Theorem 1 can also be proven in a different way by using a
result on the unrooted tree case by Steel and Sanderson (2010) and regarding the
root as an outgroup taxon. This idea is further explained in Remark 1.

Now we provide a simple example for a phylogenetically decisive set of
taxon sets in the rooted sense.

Example 1 Let $X = \{1,2,3,4\}$. Then, by Main Theorem 1, the set
$S := \{\{1,2,3\}, \{1,2,4\}, \{1,3,4\}, \{2,3,4\}\}$ is phylogenetically decisive,
because it contains all possible triplet subsets of $X$ and thus there is no cross
triple with respect to any partition.

Note that Example 1 shows that if there is no cross triple with respect
to any partition, this does not imply that $S$ has to contain $X$. The latter case
would be a trivial case, because then a supertree of the trees for all $Y_i \in S$
would equal the tree provided for $X$. In this respect, Example 1 shows that
there are non-trivial ways to keep cross triples out of a set of taxon samples and
thus to ensure that all collections of compatible input trees will lead to a unique
supertree.

In the following, we consider the case of unrooted trees, which is a bit 
more complex than the rooted case. 



3.2 Unrooted tree case 

By considering Example 1, one can easily see that a set $S$ that is phylogenetically
decisive for the case where the input tree topologies chosen are rooted
need not be phylogenetically decisive for unrooted input tree topologies. In
the above example, all sets in $S$ contain only three taxa, which implies for the
unrooted case that all chosen input tree topologies can only be star trees and
thus cannot lead to a unique supertree (as star trees are compatible with all
possible supertrees). So the concept of phylogenetic decisiveness cannot be
directly transferred from rooted to unrooted trees. However, we now show
that Proposition 2 can be generalized to the unrooted case when regarding
quadruples instead of triples.

Proposition 4 Let $S = \{Y_1, \ldots, Y_k\}$ be a set of subsets of $X$ such that there is
no cross quadruple with respect to any partition of $S$. Then, $S$ is phylogenetically
decisive.

Proof Let $S = \{Y_1, \ldots, Y_k\}$ be such that there is no cross quadruple with
respect to any partition of $S$. Assume $S$ is not phylogenetically decisive. Then
there exists a choice of trees $T_1, \ldots, T_k$ on $Y_1, \ldots, Y_k$ such that these trees are
represented by at least two different supertrees $\mathcal{T}_1$ and $\mathcal{T}_2$. Since these two
trees are different, they resolve at least one quadruple $\{a,b,c,d\}$ of taxa in $X$
differently, and thus in particular this quadruple $\{a,b,c,d\}$ is not resolved
by any of the input trees $T_1, \ldots, T_k$. But as there is no CQ wrt the partition
$\pi = Y_1|Y_2|\ldots|Y_k$, which splits all elements of $S$ apart, by Definition 5
all quadruples are subsets of at least one $Y_i$. So $\{a,b,c,d\} \subseteq Y_i$ for some
$i \in \{1, \ldots, k\}$, and therefore $\{a,b,c,d\}$ is resolved by $T_i$. This is a
contradiction and thus completes the proof.



Note that for the unrooted case, a characterization of phylogenetic
decisiveness was already given by Steel and Sanderson (2010) in terms of the
following theorem.



Theorem 2 (Steel and Sanderson (2010), 'Four-way partition property')
A collection $S = \{Y_1, \ldots, Y_k\}$ of subsets of a taxon set $X$ is phylogenetically decisive
(in the unrooted sense) if and only if it satisfies the four-way partition property, i.e.
if for all partitions $\pi = X_1|X_2|X_3|X_4$ of $X$ into four non-empty subsets, there exist
taxa $x_i \in X_i$ (for $i = 1,2,3,4$) such that the quadruple $\{x_1,x_2,x_3,x_4\}$ is contained
in $Y_j$ for some $j \in \{1, \ldots, k\}$.
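
To see what the four-way partition property asks for, the following Python sketch checks it by naive enumeration. It is meant for very small taxon sets only and is not the method of Steel and Sanderson: it runs through all $4^n$ labelings of the taxa by four blocks, so it is exponential in $n$. The function name is hypothetical. The check relies on the observation that suitable taxa $x_1, \ldots, x_4$ exist for a partition exactly when some $Y_j$ meets all four blocks.

```python
from itertools import product

def satisfies_four_way_partition_property(X, S):
    """Naive check of the four-way partition property (exponential in |X|)."""
    taxa = sorted(X)
    taxon_sets = [frozenset(Y) for Y in S]
    for labels in product(range(4), repeat=len(taxa)):
        if len(set(labels)) < 4:
            continue  # at least one of the four blocks would be empty
        block_of = dict(zip(taxa, labels))
        # Some Y_j must contain one taxon from each of the four blocks.
        if not any(len({block_of[t] for t in Y}) == 4 for Y in taxon_sets):
            return False
    return True

# Hypothetical toy input: the four 3-element subsets of {1,2,3,4}.
triples = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}]
print(satisfies_four_way_partition_property({1, 2, 3, 4}, triples))
```

For the four 3-element subsets of $\{1,2,3,4\}$, the partition into singletons is met by no taxon set, so the property fails.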

However, if one wants to decide if a particular set of taxon sets is phylo- 
genetically decisive, checking all partitions of the set into four subsets is com- 
putationally expensive. Therefore, we subsequently develop another charac- 
terization. 

Remark 1 Theorem 2 gives rise to a different view on the rooted tree case.
Given a set of taxon sets $S = \{Y_1, \ldots, Y_k\}$, one can attach to the root of each
input tree $T_i$ on $Y_i$ a pending edge and label the resulting leaf, say, $x_\rho$.
Accordingly, for all $i = 1, \ldots, k$, we define $Y_i' := Y_i \cup \{x_\rho\}$ and $X' := X \cup \{x_\rho\}$.
Then a rooted triple $\{a,b,c\} \subseteq X$ can be regarded as unrooted quadruple
$\{x_\rho,a,b,c\} \subseteq X'$. By Theorem 2, one can find an alternative approach to prove
Theorem 1. It can be easily checked that $S' := \{Y_1', \ldots, Y_k'\}$ fulfills the four-way
partition property if and only if there is no cross triple of $S$ with respect to
$Y_1|Y_2|\ldots|Y_k$.

Next we state an additional definition to simplify the terminology for the 
subsequent proofs. 

Definition 7 Let $X$ be a taxon set and $S = \{Y_1, \ldots, Y_k\}$ be a set of subsets
of $X$. Then, a quadruple $\{a,b,c,d\} \subseteq X$ is called a global cross quadruple or
just cross quadruple for short if it is a cross quadruple with respect to
$\pi = Y_1|\ldots|Y_k$, i.e. if $\{a,b,c,d\} \not\subseteq Y_i$ for all $i = 1, \ldots, k$.

Before deriving a new characterization of phylogenetic decisiveness for
the unrooted case, we show that Theorem 1 cannot be generalized to this
case by just translating triples to quadruples. For this example, we use the
characterization provided by Theorem 2.

Example 2 Let $X = \{1,2,3,4,5\}$. Then, the set $S := \{\{1,2,3,4\}, \{1,2,4,5\},
\{1,3,4,5\}, \{2,3,4,5\}\}$ is phylogenetically decisive. This can be verified by
Theorem 2 as follows: four out of the five possible quadruples form elements
of $S$ and are therefore resolved by all possible input tree collections. So the
four-way partition property is fulfilled for all partitions $\pi = X_1|X_2|X_3|X_4$ of
$X$ which split any of these quadruples apart. However, the only CQ, namely
$\{1,2,3,5\}$, could be critical: what if this quadruple gets split apart by a
4-partition? For this case, we need to verify the 4-way partition property. Wlog.
$1 \in X_1$, $2 \in X_2$, $3 \in X_3$, $5 \in X_4$. It can be easily verified that for all $i = 1,2,3,4$,
if taxon 4 is in $X_i$, the 4-way partition property holds and therefore, $S$ is
phylogenetically decisive even though there is a cross quadruple. Thus, in the
unrooted case there may be CQs even in phylogenetically decisive sets of
taxon sets.

The above example shows that the unrooted case is more complicated 
than the rooted case. In the rooted case, phylogenetic decisiveness is equiva- 
lent to the absence of cross triples, but in the unrooted case, cross quadruples 
do not necessarily destroy the phylogenetic decisiveness. However, this is 
only true if the cross quadruples have a certain property. Informally speak- 
ing, a cross quadruple is a quadruple unresolved by the input trees - so in 
order for it to be resolved uniquely by all possible supertrees, there must be 
additional information on the quadruple somewhere in the input sets. There- 
fore, we now introduce the notion of so-called fixing taxa. 

Definition 8 Let $X$ be a set of taxa and $S = \{Y_1, \ldots, Y_k\}$ be a set of subsets
of $X$. Let $\{a,b,c,d\}$ be a global CQ. Then, taxon $x \in X \setminus \{a,b,c,d\}$ is called
a fixing taxon of $\{a,b,c,d\}$, if for each of the four sets $\{a,b,c,x\}$, $\{a,b,d,x\}$,
$\{a,c,d,x\}$ and $\{b,c,d,x\}$ there exists a $j \in \{1, \ldots, k\}$ such that this set is
contained in $Y_j$, respectively.
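
Definition 8 translates directly into a membership test. The sketch below lists the global cross quadruples of a collection and the fixing taxa of a given cross quadruple; all function names are hypothetical and chosen for illustration only.

```python
from itertools import combinations

def is_covered(quad, S):
    """True if the quadruple is contained in at least one taxon set Y_j."""
    return any(set(quad) <= set(Y) for Y in S)

def global_cross_quadruples(X, S):
    """All 4-element subsets of X contained in no Y_j (Definition 7)."""
    return [frozenset(q) for q in combinations(sorted(X), 4)
            if not is_covered(q, S)]

def fixing_taxa(X, S, cq):
    """All taxa x outside cq such that each of the four sets obtained by
    replacing one element of cq with x lies in some Y_j (Definition 8)."""
    cq = frozenset(cq)
    return [x for x in sorted(set(X) - cq)
            if all(is_covered((cq - {t}) | {x}, S) for t in cq)]

# The set S from Example 2.
X = {1, 2, 3, 4, 5}
S = [{1, 2, 3, 4}, {1, 2, 4, 5}, {1, 3, 4, 5}, {2, 3, 4, 5}]
for cq in global_cross_quadruples(X, S):
    print(sorted(cq), fixing_taxa(X, S, cq))
```

For the set $S$ of Example 2, the only global cross quadruple is $\{1,2,3,5\}$, and its only candidate fixing taxon is 4.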

We now show the role of fixing taxa in resolving cross quadruples. 

Proposition 5 Let $X$ be a set of taxa and $S = \{Y_1, \ldots, Y_k\}$ be a set of subsets of $X$.
Let $\{a,b,c,d\}$ be a global CQ of $S$ with fixing taxon $x \in X$. Then, any assignment
of compatible trees $T_1, \ldots, T_k$ on the taxon sets $Y_1, \ldots, Y_k$ resolves $\{a,b,c,d\}$ in a
unique way, i.e. for all pairs of supertrees $\mathcal{T}$, $\tilde{\mathcal{T}}$ of $T_1, \ldots, T_k$, we have:
$\mathcal{T}|_{\{a,b,c,d\}} = \tilde{\mathcal{T}}|_{\{a,b,c,d\}}$.

Proof Let $\{a,b,c,d\}$ be a global CQ of $S$ with fixing taxon $x \in X$. Then, for
any assignment of compatible trees $T_1, \ldots, T_k$ on the taxon sets $Y_1, \ldots, Y_k$, the
sets $\{a,b,c,x\}$, $\{a,b,d,x\}$, $\{a,c,d,x\}$ and $\{b,c,d,x\}$ are all resolved by the
definition of a fixing taxon. In the following, a quartet tree on taxa $\{w,x,y,z\}$
inducing the split $wx|yz$ is identified with this split in order to simplify the
notation. Then, it can be easily checked that given a compatible collection
$Q$ of quartet splits, another quartet split $wx|yz \notin Q$ may be implied (cf.
quartet rules in Semple and Steel (2003), p. 129, as well as in Steel (1992)). We
now examine each possible input quartet split of the resolved sets $\{a,b,c,x\}$,
$\{a,b,d,x\}$, $\{a,c,d,x\}$ and $\{b,c,d,x\}$ to see a) if the respective set is compatible
and b) if yes, if and how it resolves the CQ $\{a,b,c,d\}$:



1. $\{ab|cx, ab|dx\} \Rightarrow ab|cd$
2. $\{ab|cx, ad|bx\} \Rightarrow ad|bc$
3. $\{ab|cx, ax|bd\} \Rightarrow ac|bd$
4. $\{ac|bx, ab|dx\} \Rightarrow ac|bd$
5. $\{ac|bx, ad|bx, ac|dx\} \Rightarrow ac|bd$
6. $\{ac|bx, ad|bx, ad|cx\} \Rightarrow ad|bc$
7. $\{ac|bx, ad|bx, ax|cd\}$ incompatible
8. $\{ac|bx, ax|bd\} \Rightarrow ac|bd$
9. $\{ax|bc, ab|dx\} \Rightarrow ad|bc$
10. $\{ax|bc, ad|bx\} \Rightarrow ad|bc$
11. $\{ax|bc, ax|bd, ac|dx\}$ incompatible
12. $\{ax|bc, ax|bd, ad|cx\}$ incompatible
13. $\{ax|bc, ax|bd, ax|cd, bc|dx\} \Rightarrow ad|bc$
14. $\{ax|bc, ax|bd, ax|cd, bd|cx\} \Rightarrow ac|bd$
15. $\{ax|bc, ax|bd, ax|cd, bx|cd\} \Rightarrow ab|cd$



So in all cases where the input quartets are compatible, the CQ {a, b, c, d} 
is uniquely resolved. This completes the proof. 



□ 



Remark 2 Note that in 13 - 15, the first three quartets are not enough to resolve 
the CQ {a, b, c, d} uniquely, so the fixing taxon of all four subtriples of the CQ 
is needed. 

Example 3 (Example 2 continued.) In Example 2, taxon 4 is a fixing taxon of the
only CQ $\{1,2,3,5\}$. Thus, by Proposition 5, this CQ is resolved in the same
way in all possible supertrees of the trees corresponding to the taxon sets in
$S$.



However, the existence of a fixing taxon is sufficient but not necessary 
for a cross quadruple to be resolved. This is demonstrated by the following 
example. 



Example 4 Let $X = \{1,2,3,4,5,6\}$ and $S := \{\{1,2,3,5\}, \{1,2,4,5\}, \{1,2,4,6\},
\{1,2,5,6\}, \{1,2,3,6\}, \{1,3,4,6\}, \{1,3,5,6\}, \{1,4,5,6\}, \{2,3,4,5\}, \{2,3,4,6\},
\{2,3,5,6\}\}$. In this case, $S$ has four CQs: $\{1,2,3,4\}$, $\{1,3,4,5\}$, $\{2,4,5,6\}$ and
$\{3,4,5,6\}$. It can be checked (calculation not shown) that $S$ is phylogenetically
decisive by verifying the four-way partition property (see Theorem 2)
explicitly for all possible partitions of $X$. However, while the CQs $\{1,2,3,4\}$
and $\{2,4,5,6\}$ have fixing taxa 6 and 1, respectively, the other two CQs do
not have any fixing taxa.

Note that cross quadruples are by definition sets of four taxa which are
not resolved by any input tree, but by Proposition 5, a cross quadruple is
resolved by a supertree of the input trees if it has a fixing taxon. The main idea
we present below in Theorem 3 is that cross quadruples can be iteratively
resolved: if a cross quadruple has no fixing taxon, but, say, taxon $x$ would be
a fixing taxon if another quadruple was resolved, which in turn does have a
fixing taxon, then the original cross quadruple can 'inherit' the resolvedness
by resolving the second quadruple first. Thus, we need to distinguish directly
and indirectly resolved cross quadruples.

Definition 9 Let $X$ be a set of taxa and $S = \{Y_1, \ldots, Y_k\}$ be a set of subsets of
$X$. Let $\{a,b,c,d\}$ be a global CQ of $S$. Then, we call $\{a,b,c,d\}$ directly resolved
if it has a fixing taxon. We call $\{a,b,c,d\}$ indirectly resolved if there is a taxon
$x \in X$ such that for each of the four sets $\{a,b,c,x\}$, $\{a,b,d,x\}$, $\{a,c,d,x\}$ and
$\{b,c,d,x\}$ one of the following conditions holds:

1. The set is not a cross quadruple.

2. The set is a cross quadruple but has a fixing taxon.

3. The set is a cross quadruple and is itself indirectly resolved.
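
One way to read Definition 9 operationally is as a fixed-point iteration: start with all quadruples that are contained in some $Y_j$, then repeatedly add every cross quadruple for which some taxon $x$ makes all four neighbouring quadruples already known to be resolved. The sketch below follows this reading. Interpreting the recursive definition as a least fixed point is an assumption of this sketch, and the function names are hypothetical.

```python
from itertools import combinations

def resolved_quadruples(X, S):
    """Quadruples that are contained in some Y_j, or that are directly or
    indirectly resolved cross quadruples (fixed-point reading of Def. 9)."""
    taxa = sorted(X)
    taxon_sets = [frozenset(Y) for Y in S]
    all_quads = [frozenset(q) for q in combinations(taxa, 4)]
    resolved = {q for q in all_quads if any(q <= Y for Y in taxon_sets)}
    changed = True
    while changed:
        changed = False
        for q in all_quads:
            if q in resolved:
                continue
            for x in taxa:
                if x in q:
                    continue
                # x works if every set "q with one taxon swapped for x" is resolved.
                if all(((q - {t}) | {x}) in resolved for t in q):
                    resolved.add(q)
                    changed = True
                    break
    return resolved

def unresolved_cross_quadruples(X, S):
    """Cross quadruples that are neither directly nor indirectly resolved."""
    resolved = resolved_quadruples(X, S)
    return [set(q) for q in combinations(sorted(X), 4)
            if frozenset(q) not in resolved]

# The set S from Example 4.
X = {1, 2, 3, 4, 5, 6}
S = [{1,2,3,5},{1,2,4,5},{1,2,4,6},{1,2,5,6},{1,2,3,6},{1,3,4,6},
     {1,3,5,6},{1,4,5,6},{2,3,4,5},{2,3,4,6},{2,3,5,6}]
print(unresolved_cross_quadruples(X, S))
```

For Example 4, the cross quadruples $\{1,3,4,5\}$ and $\{3,4,5,6\}$ have no fixing taxon, but taxon 2 works for both once $\{1,2,3,4\}$ and $\{2,4,5,6\}$ have been resolved directly, so no cross quadruple remains unresolved. To examine a single quartet of interest, it suffices to test membership of that quartet in the returned set.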

We now state some helpful characteristics of CQs of phylogenetically deci- 
sive sets of taxon sets in order to derive a characterization of phylogenetically 
decisive sets in the unrooted case. 

Proposition 6 Let $X = \{1, \ldots, n\}$ be a set of at least four taxa and $S = \{Y_1, \ldots, Y_k\}$
be a phylogenetically decisive set of subsets of $X$. Then each triple $\{x,y,z\} \subseteq X$ is
contained in at least one $Y_j$ with $|Y_j| \ge 4$.

Proof Let $S = \{Y_1, \ldots, Y_k\}$ be phylogenetically decisive. Assume there exist
three taxa $x,y,z \in X$ such that for all $j$ with $|Y_j| \ge 4$: $\{x,y,z\} \not\subseteq Y_j$. Let
$X_1 := \{x\}$, $X_2 := \{y\}$, $X_3 := \{z\}$ and $X_4 := X \setminus \{x,y,z\}$. Then, $\pi := X_1|X_2|X_3|X_4$
partitions $X$ into four non-empty subsets. By assumption, there is no taxon $a$ in
$X$, and thus none in $X_4$, such that $\{x,y,z,a\} \subseteq Y_j$ for some $j$. Then, the four-way
partition property fails for $\pi$. Thus, $S$ is not phylogenetically decisive. This is
a contradiction and thus completes the proof.



□

Definition 10 Let $X$ be a set of taxa and $S = \{Y_1, \ldots, Y_k\}$ be a set of subsets
of $X$. Then, the colored 3-overlap graph of $S$ is a graph where the nodes are all
possible quadruples in $\binom{X}{4} = \{Y : Y \subseteq X \text{ and } |Y| = 4\}$, and where two nodes
are connected with an edge if the corresponding quadruples share three taxa.
Moreover, all nodes are either colored red or green: they are green if they
are contained in at least one $Y_j$ and red otherwise (note: the red ones are the
global cross quadruples).
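
The colored 3-overlap graph can be built directly from Definition 10. The following Python sketch returns the node coloring as a dictionary and the edges as a list of pairs; the function name and this particular data layout are assumptions made for illustration.

```python
from itertools import combinations

def colored_three_overlap_graph(X, S):
    """Build the colored 3-overlap graph of S as (color, edges).

    color maps each quadruple (a frozenset) to "green" or "red";
    edges lists all pairs of quadruples sharing exactly three taxa.
    """
    taxon_sets = [frozenset(Y) for Y in S]
    nodes = [frozenset(q) for q in combinations(sorted(X), 4)]
    color = {q: ("green" if any(q <= Y for Y in taxon_sets) else "red")
             for q in nodes}
    edges = [(p, q) for p, q in combinations(nodes, 2) if len(p & q) == 3]
    return color, edges

# Hypothetical toy input on five taxa.
color, edges = colored_three_overlap_graph(
    {1, 2, 3, 4, 5},
    [{1, 2, 3, 4}, {1, 2, 4, 5}, {1, 3, 4, 5}, {2, 3, 4, 5}])
print(sorted(sorted(q) for q, c in color.items() if c == "red"))
print(len(edges))
```

Two distinct quadruples share at most three taxa, so the test `len(p & q) == 3` captures exactly the edges of the definition. On five taxa, any two distinct quadruples share exactly three taxa, so the graph in the toy input is complete.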

Example 5 We continue Example 4: $X = \{1,2,3,4,5,6\}$ and $S := \{\{1,2,3,5\},
\{1,2,4,5\}, \{1,2,4,6\}, \{1,2,5,6\}, \{1,2,3,6\}, \{1,3,4,6\}, \{1,3,5,6\}, \{1,4,5,6\},
\{2,3,4,5\}, \{2,3,4,6\}, \{2,3,5,6\}\}$. As stated before, $S$ has four CQs: $\{1,2,3,4\}$,
$\{1,3,4,5\}$, $\{2,4,5,6\}$ and $\{3,4,5,6\}$. We construct the colored 3-overlap graph
as depicted by Figure 4. Note that the CQs are all red.




Fig. 4: The colored 3-overlap graph. 

We are now in a position to prove a characterization for resolved global
cross quadruples.

Theorem 3 Let $X$ be a set of taxa and $S = \{Y_1, \ldots, Y_k\}$ be a set of subsets of
$X$. Let $\{a,b,c,d\}$ be a global CQ of $S$. Then, $\{a,b,c,d\}$ is uniquely resolved by
all supertrees of compatible input trees $T_1, \ldots, T_k$ on $Y_1, \ldots, Y_k$ if and only if it is
directly or indirectly resolved.



Proof 

- Assume $\{a,b,c,d\}$ is directly or indirectly resolved. If it is directly
resolved, it has a fixing taxon and thus, by Proposition 5, there remains
nothing to show. If it is indirectly resolved, one can apply Proposition 5
iteratively in order to derive a chain of resolved quadruples (starting with
one that has a fixing taxon and contains a taxon which acts as a fixing
taxon for another cross quadruple, and so on) until $\{a,b,c,d\}$ is resolved.
- Assume $\{a,b,c,d\}$ is a global CQ of $S$ but is uniquely resolved by all
supertrees for all possible combinations of compatible input trees $T_1, \ldots, T_k$
on the taxon sets $Y_1, \ldots, Y_k$. Assume $\{a,b,c,d\}$ is not directly or
indirectly resolved. Then there is no fixing taxon resolving $\{a,b,c,d\}$, nor are there other
cross quadruples which would in turn deliver a fixing taxon for $\{a,b,c,d\}$.
Therefore, for all $x \in X$, at most three of the sets $\{a,b,c,x\}$, $\{a,b,d,x\}$,
$\{a,c,d,x\}$ and $\{b,c,d,x\}$ in the colored 3-overlap graph are either green
or red but directly or indirectly resolved. This implies that for all $x \in X$,
at least one of the sets $\{a,b,c,x\}$, $\{a,b,d,x\}$, $\{a,c,d,x\}$ and $\{b,c,d,x\}$ is
unresolved.

Then, all of these four quadruples which contain three taxa of $\{a,b,c,d\}$
and some taxon $x \in X$ also contain at least one more taxon which lies
in all of them: e.g. if $\{b,c,d,x\}$ is not resolved, but $\{a,b,c,x\}$, $\{a,b,d,x\}$
and $\{a,c,d,x\}$ are resolved, the intersection of the resolved sets contains
both $x$ and $a$. In other words, considering the four quadruples $\{a,b,c,x\}$,
$\{a,b,d,x\}$, $\{a,c,d,x\}$ and $\{b,c,d,x\}$ and assuming that at least one of
them is not directly or indirectly resolved, the intersection of the resolved
quadruples has at least cardinality 2. So assume for some $x \in X$ without
loss of generality that the intersection of the three sets contains $x$ and $a$.
We now assume for some $x \in X$ without loss of generality that only the
sets $\{a,b,c,x\}$, $\{a,b,d,x\}$ and $\{a,c,d,x\}$ (or just one or two of them) are
either green or red but directly or indirectly resolved. Note that if they
are green, they are contained in one of the input sets $Y_i$, and if they are
red but directly or indirectly resolved, all possible compatible choices of
$T_1, \ldots, T_k$ lead to a unique resolution of them by the first part of the proof.
Thus, choose $T_1, \ldots, T_k$ such that they are compatible trees on the taxon
sets $Y_1, \ldots, Y_k$ and such that all their supertrees display the mentioned
sets as follows: $ax|bc$, $ax|bd$ and $ax|cd$. For example, we can construct the
input trees such that $x$ and $a$ are put on a cherry. Then, the resolutions
$ab|cd$, $ac|bd$ and $ad|bc$ are all possible in supertrees of $T_1, \ldots, T_k$. Therefore,
if $\mathcal{T}$ is a supertree of $T_1, \ldots, T_k$ such that $\mathcal{T}|_{\{a,b,c,d\}} = ab|cd$, then also $\tilde{\mathcal{T}}$ is
a supertree of $T_1, \ldots, T_k$, where we define $\tilde{\mathcal{T}}$ as follows: $\tilde{\mathcal{T}}|_{\{a,b,c,d\}} = ac|bd$
and $\tilde{\mathcal{T}}|_{X \setminus \{a,b,c,d\}} = \mathcal{T}|_{X \setminus \{a,b,c,d\}}$. Thus, $\{a,b,c,d\}$ is not resolved for this
choice of $T_1, \ldots, T_k$. This is a contradiction and thus completes the proof.

□ 



Next we state a straightforward characterization of phylogenetic decisive- 
ness, before we combine this characterization with the previous theorem to 
obtain Main Theorem 2. 

Proposition 7 Let $X$ be a set of taxa and $S = \{Y_1, \ldots, Y_k\}$ be a set of
subsets of $X$. Then, $S$ is phylogenetically decisive if and only if all global
cross quadruples are directly or indirectly resolved.

Proof 

1. Assume all global cross quadruples of $S$ are resolved. Assume $S$ is not
phylogenetically decisive. Then there exist compatible trees
$T_1, \ldots, T_k$ on $Y_1, \ldots, Y_k$ such that there are two different
trees $\mathcal{T}_1$ and $\mathcal{T}_2$ that display $T_1, \ldots, T_k$.
As $\mathcal{T}_1$ and $\mathcal{T}_2$ are different, there is at least one
quadruple $\{a,b,c,d\}$ which is resolved differently in $\mathcal{T}_1$
and $\mathcal{T}_2$. Then, in particular, $\{a,b,c,d\} \not\subseteq Y_j$
for all $j = 1, \ldots, k$, and thus, by definition, $\{a,b,c,d\}$ is a
global cross quadruple. Thus, $\{a,b,c,d\}$ must be resolved for all
choices of input trees $T_1, \ldots, T_k$. This contradicts the fact that
$\mathcal{T}_1$ and $\mathcal{T}_2$ are supertrees of $T_1, \ldots, T_k$
but display conflicting resolutions of $\{a,b,c,d\}$. This completes the
proof.

2. Let $X$ be a set of taxa and $S = \{Y_1, \ldots, Y_k\}$ be a
phylogenetically decisive set of subsets of $X$. Then, if there are no
unresolved cross quadruples, there is nothing to show. So assume there
exists a global cross quadruple $\{a,b,c,d\}$ which is not resolved. Then,
there exists a choice of trees $T_1, \ldots, T_k$ on taxon sets
$Y_1, \ldots, Y_k$ such that there are at least two supertrees
$\mathcal{T}_1$ and $\mathcal{T}_2$ displaying two different resolutions of
the quadruple $\{a,b,c,d\}$, wlog. $\mathcal{T}_1|_{\{a,b,c,d\}} = ab|cd$
and $\mathcal{T}_2|_{\{a,b,c,d\}} = ac|bd$. So clearly,
$\mathcal{T}_1 \neq \mathcal{T}_2$, but both are supertrees of
$T_1, \ldots, T_k$. This contradicts the phylogenetic decisiveness of $S$
and thus completes the proof.

□ 



We are now in a position to state Main Theorem 2, which shows that even 
if the setting for unrooted trees is more complicated than the one for rooted 
trees, the decision whether or not a given set of taxon sets is phylogenetically 
decisive can still be made in polynomial time. 
Theorem 4 (Main Theorem 2) Given a set $S = \{Y_1, \ldots, Y_k\}$ of subsets of $X$,
where $|X| = n$, the question whether $S$ is phylogenetically decisive can be answered
in at most $O(k \cdot n^{16})$ steps. In particular, as long as $k$ is polynomial in $n$,
the question can be answered in polynomial time.

Proof (Main Theorem 2) Let $S = \{Y_1, \ldots, Y_k\}$ be a set of subsets of $X$, where
$|X| = n$. We want to decide if $S$ is phylogenetically decisive.

1. We start by generating the colored 3-overlap graph of the quadruples as
follows: For each quadruple $\{a,b,c,d\} \in \binom{X}{4}$, we draw a node
and color it green if there is a $j \in \{1, \ldots, k\}$ such that
$\{a,b,c,d\} \subseteq Y_j$ and red otherwise (Note: This step is bounded by
$k \cdot \binom{n}{4} < k \cdot n^4$). We then draw the edges of the graph
as follows: For any two nodes, we connect them if and only if they share
three taxa (Note: This step is bounded by
$\binom{n}{4} \cdot \left(\binom{n}{4} - 1\right) < n^8$).

2. All red nodes are global CQs of $S$ (Note: Their number is bounded by
$\binom{n}{4} < n^4$). They need to be checked for fixing taxa. So as long
as there are red nodes, check for each CQ $\{a,b,c,d\}$ if it has a fixing
taxon, i.e. if there is a taxon $x \in X$ such that $\{a,b,c,x\}$,
$\{a,b,d,x\}$, $\{a,c,d,x\}$ and $\{b,c,d,x\}$ are all green (Note: This
check takes fewer than $4n^4$ steps, so the entire order of this step is
bounded by $n^4 \cdot 4n^4$, so that this step can be done in $O(n^{16})$).
If there are red nodes but none of them has a fixing taxon, STOP: $S$ is
not phylogenetically decisive. For all red nodes with a fixing taxon,
change their color to green. Repeat until there are only green nodes left.
In this case, $S$ is phylogenetically decisive.

Clearly, this algorithm has a polynomial running time as long as $k$ is not
exponential in $n$, and it terminates once all nodes are colored green or no
more red node has a fixing taxon. The correctness of the algorithm follows
from Proposition 7 and Theorem 3.

□ 
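
The two steps of the proof can be sketched in Python as follows. This is only
a minimal sketch under stated assumptions: the function names is_decisive and
has_fixing_taxon and the representation of taxa as hashable labels are
hypothetical choices that do not appear in the text above. The 3-overlap graph
is kept implicit, since the neighbors of a quadruple q that share three taxa
with it are exactly the sets obtained by swapping one taxon of q for some other
taxon x. No attempt is made to match the running time bound of the theorem.

```python
from itertools import combinations


def has_fixing_taxon(quad, taxa, green):
    """Return True if some taxon x outside quad makes all four swaps green."""
    for x in taxa:
        if x in quad:
            continue
        # Swapping each taxon t of quad for x gives the four neighbors
        # {a,b,c,x}, {a,b,d,x}, {a,c,d,x}, {b,c,d,x}.
        if all((quad - {t}) | {x} in green for t in quad):
            return True
    return False


def is_decisive(taxa, taxon_sets):
    """Sketch of the recoloring procedure (hypothetical helper, not the author's code)."""
    taxa = sorted(taxa)
    sets = [frozenset(y) for y in taxon_sets]
    quads = {frozenset(q) for q in combinations(taxa, 4)}
    # Step 1: green if contained in some input set, red otherwise.
    green = {q for q in quads if any(q <= y for y in sets)}
    red = quads - green
    # Step 2: recolor red nodes with a fixing taxon until nothing changes.
    while red:
        fixed = {q for q in red if has_fixing_taxon(q, taxa, green)}
        if not fixed:
            return False  # red nodes remain, none has a fixing taxon
        green |= fixed
        red -= fixed
    return True


if __name__ == "__main__":
    # Every 4-subset of {1,...,5} containing taxon 5 is an input set,
    # so the only red quadruple {1,2,3,4} has fixing taxon 5.
    sets = [{1, 2, 3, 5}, {1, 2, 4, 5}, {1, 3, 4, 5}, {2, 3, 4, 5}]
    print(is_decisive(range(1, 6), sets))
```

In the small usage case at the bottom, the single cross quadruple is recolored
in the first iteration, so the sketch reports that the input sets are decisive.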



Informally speaking, the algorithm presented in the proof of Main Theo- 
rem 2 works as follows: If there are no cross quadruples, all nodes in the col- 
ored 3-overlap graph are green and S is phylogenetically decisive. However, 
if there are red nodes, i.e. there are cross quadruples, the algorithm checks 
if they can be resolved. Each cross quadruple with a fixing taxon can be re- 
solved and thus is colored green. These newly green-colored nodes might 
help to deliver a fixing taxon to a previously unresolved node in the graph. 
So the red neighbors of a newly colored node need to be checked again for 
being resolvable. Once nothing can be recolored, the algorithm stops. If
everything is green at that point, S is phylogenetically decisive; otherwise
it is not. We now demonstrate the algorithm with a short example.

Example 6 We continue Examples 4 and 5. The colored 3-overlap graph is
depicted in Figure 4. In the first iteration, we check all red nodes for
fixing taxa. It turns out that the CQ $\{1,2,3,4\}$ has fixing taxon 6 and
$\{2,4,5,6\}$ has fixing taxon 1. Therefore, these two nodes change their
color from red to green, see Figure 5. For the second iteration, there are
still two red nodes, namely $\{1,3,4,5\}$ and $\{3,4,5,6\}$. We check the
graph and find that since $\{1,2,3,4\}$ is now green and $\{2,3,4,5\}$,
$\{1,2,3,5\}$ and $\{1,2,4,5\}$ were green right from the beginning, taxon 2
now acts as a fixing taxon for $\{1,3,4,5\}$. Thus, we can change the color
of $\{1,3,4,5\}$ from red to green. Similarly, 2 now also acts as a fixing
taxon for $\{3,4,5,6\}$, as $\{2,4,5,6\}$ turned green in the previous
iteration. So we can now change the color of $\{3,4,5,6\}$ from red to
green. Now there are only green nodes left, as shown in Figure 6, which
means that $S$ is phylogenetically decisive. This verifies the above result
derived with the help of the four-way partition property by Steel and
Sanderson (2010).




Fig. 5: The colored 3-overlap graph after the first iteration. 
Fig. 6: The colored 3-overlap graph after the second iteration: all nodes are green, so S is phylogenetically
decisive.



4 Discussion 

Deriving new information on taxa by combining various compatible trees on
different taxon sets into one common supertree is a challenging task, even if
the input trees are compatible with one another. It is therefore
understandable that phylogeneticists seek to understand a priori if a
particular set of input trees has the potential to deliver new information,
or in the perfect case even to provide a unique supertree. In the perfect
case, the set of input taxon sets is called phylogenetically decisive.

In our paper, we provide algorithms (both for the rooted and unrooted
tree case) to determine whether a set of taxon sets is phylogenetically
decisive, and the algorithms are polynomial as long as the number of taxon
sets under investigation is polynomial in the total number of taxa. Our
algorithms are quite intuitive: They check for the smallest possible
information unit of each setting (i.e. triples in the rooted case and
quadruples in the unrooted case) whether all such instances are uniquely
resolved by all supertrees. If all such small units are uniquely resolved by
all supertrees, the supertree itself must be unique. Moreover, this stepwise
approach in both algorithms makes them useful even if the underlying set of
taxon sets turns out not to be phylogenetically decisive. For instance,
particularly relevant quadruples can be investigated on their own: a
quadruple can be uniquely resolved by all supertrees even if the supertree
is not unique, and this can be checked with our algorithm. On the other
hand, using our approach, one can determine the quadruples which cause a set
of taxon sets to not be phylogenetically decisive and thus decide whether
these taxa should be excluded from the study. Thus, our approach is not only
suitable to determine whether a set of taxa is perfectly sampled, but also
which taxa are problematic. However, we are aware that the algorithms as
stated here are not runtime optimized and thus can probably be improved.
This would be interesting for future work.
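
The diagnostic use described above can be illustrated with a short Python
sketch for the unrooted case. It is an illustration under assumptions, not the
implementation used in the paper: the names unresolved_quadruples and
taxa_in_unresolved are hypothetical, and the counting of how often each taxon
occurs in an unresolved quadruple is just one possible way of flagging
problematic taxa.

```python
from collections import Counter
from itertools import combinations


def unresolved_quadruples(taxa, taxon_sets):
    """Return the quadruples that stay red after no more recoloring is possible."""
    taxa = sorted(taxa)
    sets = [frozenset(y) for y in taxon_sets]
    quads = {frozenset(q) for q in combinations(taxa, 4)}
    # Green: contained in some input set. Red: global cross quadruple.
    green = {q for q in quads if any(q <= y for y in sets)}
    red = quads - green
    changed = True
    while changed:
        changed = False
        for q in list(red):
            # q has a fixing taxon x if all four swaps (q - {t}) | {x} are green.
            if any(all((q - {t}) | {x} in green for t in q)
                   for x in taxa if x not in q):
                green.add(q)
                red.remove(q)
                changed = True
    return red


def taxa_in_unresolved(red):
    """Count, for each taxon, the unresolved quadruples it occurs in."""
    return Counter(t for q in red for t in q)


if __name__ == "__main__":
    red = unresolved_quadruples(range(1, 6), [{1, 2, 3, 4}])
    print(sorted(sorted(q) for q in red))
    print(taxa_in_unresolved(red).most_common())
```

In this toy input, taxon 5 is not covered by any input set, so every quadruple
containing it stays red and taxon 5 has the highest count.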
